Bias , Variance , and Arcing Classifiers
نویسنده
چکیده
Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. To study this, the concepts of bias and variance of a classifier are defined. Unstable classifiers can have universally low bias. Their problem is high variance. Combining multiple versions is a variance reducing device. One of the most effective is bagging (Breiman [1996a]) Here, modified training sets are formed by resampling from the original training set, classifiers constructed using these training sets and then combined by voting. Freund and Schapire [1995,1996] propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym-arcing) so that the weights in the resampling are increased for those cases most often missclassified and the combining is done by weighted voting. Arcing is more successful than bagging in variance reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works.
منابع مشابه
Weighted Proportional k-Interval Discretization for Naive-Bayes Classifiers
The use of different discretization techniques can be expected to affect the classification bias and variance of naive-Bayes classifiers. We call such an effect discretization bias and variance. Proportional kinterval discretization (PKID) tunes discretization bias and variance by adjusting discretized interval size and number proportional to the number of training instances. Theoretical analys...
متن کاملVariance reduction trends on 'boosted' classifiers
Ensemble classification techniques such as bagging, (Breiman, 1996a), boosting (Freund & Schapire, 1997) and arcing algorithms (Breiman, 1997) have received much attention in recent literature. Such techniques have been shown to lead to reduced classification error on unseen cases. Even when the ensemble is trained well beyond zero training set error, the ensemble continues to exhibit improved ...
متن کاملArcing the edge
Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease generalization error. Procedures of this type were called arcing by Breiman[1996]. The first successful arcing procedure was introduced by Freund and Schapire[1995,1996] and called Adaboost. In an effort to e...
متن کاملOn Why Discretization Works for Naive-Bayes Classifiers
We investigate why discretization is effective in naive-Bayes learning. We prove a theorem that identifies particular conditions under which discretization will result in naiveBayes classifiers delivering the same probability estimates as would be obtained if the correct probability density functions were employed. We discuss the factors that might affect naive-Bayes classification error under ...
متن کاملBias and Variance of Rotation-Based Ensembles
In Machine Learning, ensembles are combination of classifiers. Their objective is to improve the accuracy. In previous works, we have presented a method for the generation of ensembles, named rotation-based. It transforms the training data set; it groups, randomly, the attributes in different subgroups, and applies, for each group, an axis rotation. If the used method for the induction of the c...
متن کامل